Generating data as a proxy for unavailable corpus data: the contextualized sentence completion task

Author

  • Joan Bresnan
Abstract

There is much interest in using large corpora to explore predictors of the probability of higher-level linguistic structures, but suitable corpora are not available for all languages and their varieties. We explore a task that uses discourse contexts from an existing corpus as prompts for sentence completion, to investigate how useful the method is for generating data as a proxy for unavailable corpus data. Mini databases of dative and genitive structures were obtained with the method using American and Australian participants. It is shown that the databases are indeed a good proxy for corpus data.

Similar resources

Semantic Role Labeling of Persian Sentences Using a Memory-Based Learning Approach

Abstract: Extracting semantic roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and the syntactic constituents in a sentence. In this paper we present a semantic role labeling system for Persian, using a memory-based learning model and standard features. Our proposed system implements a two-phase architecture to first identify...

Parallel Corpus Refinement as an Outlier Detection Algorithm

Filtering noisy parallel corpora or removing mistranslations from training sets can improve the quality of statistical machine translation. Discriminative methods for filtering the corpora, such as a maximum entropy model, need properly labeled training data, which are usually unavailable. Generating all possible sentence pairs (the Cartesian product) to obtain labeled data produces an im...

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for subsequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, and extraction of the names of molecules and their properties. Improvement in the performance of such systems may affect the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

An Investigation of Schematic Mental Models of Perfectionism and Approval-Seeking in Depression

Abstract. Objectives: The purpose of this research is to investigate two different perspectives on depressive thinking. One viewpoint considers depression a reflection of the increasing general accessibility of negative constructs and depressive memories; the other defines depressive thoughts as a reflection of changes at a more general level of cognitive representation. Method: 54 subjects selecte...

A Study of Methods for Evaluating Verb Tense Inflection and Determining the Best Method in 3- and 4-Year-Old Children in the City of Rasht in 1393 (Solar Hijri calendar)

Introduction: One domain of morphology is inflection, which adds syntactic considerations to words. This domain is affected in individuals with language disorders, so evaluating inflection in these people is important. In this study, methods of evaluating verb tense inflection were compared and the best method was determined. Methods: This study was descriptive-analytical. The participa...

Publication date: 2015